Engine remaining useful life prediction model based on R-Vine copula with multi-sensor data

An aeroengine is a highly complex and precise mechanical system. As the heart of an aircraft, it has a crucial impact on the overall life of the aircraft. Engine degradation is caused by multiple factors, so multi-sensor signals are used for condition monitoring and prognostics of engine performance degradation. Compared with a single sensor signal, multi-sensor signals capture the degradation information of the engine more comprehensively and can achieve higher prediction accuracy for the remaining useful life (RUL). Therefore, a new method for predicting the RUL of an engine based on the R-Vine Copula with multi-sensor data is proposed. First, because engine performance parameters change over time and performance degradation is nonlinear, the nonlinear Wiener process is used to model the degradation process of a single degradation signal. Second, the model parameters are estimated in the offline stage from historical data. In the online stage, when real-time data are obtained, the Bayesian method is used to update the model parameters. Then, the R-Vine Copula is used to model the correlation between multi-sensor degradation signals to realize online prediction of the RUL of the engine. Finally, the C-MAPSS dataset is used to verify the effectiveness of the proposed method. The experimental results show that the proposed method can effectively improve prediction accuracy.


Introduction
As the heart of an aircraft, the engine plays a decisive role in aircraft safety through its operating state. The operating environment of an aeroengine is extremely harsh, often involving high temperature and high pressure, which poses a great challenge to timely engine status detection and maintenance. With the application of condition monitoring technology, it is becoming more common to use sensors to monitor the real-time status of equipment. Prognostics and health management (PHM) technologies can monitor the health status of equipment, and remaining useful life (RUL) prediction, as the core of PHM, has attracted wide attention from researchers [1,2]. A predictive degradation model uses degradation signals to predict the RUL of the system. A degradation signal can be derived from the raw sensor signal, sometimes after some signal conversion. Traditionally, RUL prediction methods are roughly divided into two categories: model-based methods and data-driven methods. Model-based methods require sufficient knowledge of the failure mechanisms of devices and rely heavily on physical principles and engineering experience [2].
However, detailed physical knowledge of the degradation process of the underlying components is difficult to obtain. In recent years, with the increasing complexity of industrial machines and the rapid development of state detection technology, data-driven methods have become popular for predictive maintenance problems. Stochastic process-based approaches are one popular class of data-driven methods. The degradation process of equipment is uncertain because of the failure mechanism and operating environment, and this uncertainty can be effectively captured by a stochastic process. Among these, the Wiener process is widely used because of its good mathematical properties, which make it suitable for describing both monotonic and non-monotonic degradation processes [1]. The Wiener process performs well in modeling a single degradation signal. However, in practice, many factors affect the engine degradation process, and the correlation between the factors is unknown. This paper uses the R-Vine Copula to model the correlation between degradation signals.
Research on RUL prediction methods has produced many results. However, most traditional methods either assume that signals are independent of each other or predict the RUL from a single degradation signal [3,4]. In practice, signals are often interrelated. When a single degradation signal is used to describe the degradation process of equipment, the RUL can be predicted accurately only when that signal has a strong correlation with the degradation process. Therefore, the accuracy of RUL prediction based on a single degradation signal is generally unsatisfactory [5], and it is appropriate to model the correlation between multi-sensor degradation signals to improve the accuracy of RUL prediction. RUL prediction based on the Wiener process and multi-sensor data is an active research topic. Si et al. [6] proposed a widely used nonlinear Wiener process model for RUL prediction. To handle the location parameter of the measurement error that occurs during the actual degradation process, Si et al. [7] used the maximum likelihood estimation method. Liu et al. [8] used an optimal R-Vine Copula information fusion method for failure probability analysis, considering the correlation of failure modes at multiple monitoring points of the main girder of long-span bridges in service. In recent years, RUL prediction based on deep learning has also been extensively studied. Zhang et al. [9] used KF-EM-RTS to predict the RUL of a battery with unlabeled small-sample data. Zhang et al. [10] proposed a neural network composed of 1-DCNN and BiGRU to predict RUL, which integrates the spatial and temporal features of the measurement data in parallel. Zhang et al. [11] used the modified Transformer-based IMDSSN to predict RUL, which overcomes the limitation imposed by the convolution size of CNN and LSTM when processing temporal data. Zhang et al. [12] used a variational auto-encoder-long-short-term memory network-local weighted deep sub-domain adaptation network (VLSTM-LWSAN) to predict RUL.
The main contribution of the proposed model is that it introduces the R-Vine Copula, for the first time, to model multiple types of correlation among multiple features in aeroengine RUL prediction. The R-Vine Copula can capture various tail dependencies in a single model, which makes it more accurate when modeling complex multi-feature dependencies, and it is suitable for scenarios where the correlation is unknown. It uses pair Copula functions to decompose the multivariate joint distribution, which makes it possible to model multiple correlations between multiple features. The R-Vine Copula also offers more choices for decomposing the dependence structure than the D-Vine Copula and the C-Vine Copula [13].
In this paper, several degradation signals are modeled separately by the nonlinear Wiener process. In the offline stage, historical data are integrated to obtain the offline parameters. In the online stage, the online parameters of the model are obtained by combining real-time data and the offline parameters through Bayes' rule. The R-Vine Copula then joins the marginal distributions of the degradation features to obtain the joint distribution of the degradation signals, and the RUL is predicted by simulation. Finally, a simulated turbofan engine dataset is used to verify the effectiveness of the proposed method.
The remainder of this paper is organized as follows. Section 2 introduces the degradation model based on the nonlinear Wiener process, estimates the parameters of the model and obtains the RUL prediction of a single degradation signal. Section 3 uses R-Vine copula to model the correlation between multi-sensor degradation signals. Section 4 carries out RUL prediction with multi-sensor data. Finally, section 5 verifies the effectiveness of the proposed method through experiments.

Nonlinear Wiener process
The Wiener process has good mathematical properties and can describe both monotone and non-monotone degradation processes. Since engine performance degradation is nonlinear, the nonlinear Wiener process is used to model it. Let $X(t)$ denote the degradation at time $t$:
$$X(t) = X(t_0) + \alpha \int_{t_0}^{t} \mu(\tau, \beta)\, d\tau + \sigma_b B(t) \tag{1}$$
where $X(t_0)$ represents the initial state; $\alpha$ represents the drift coefficient, reflecting the degradation rate and determining the degradation path; $\sigma_b$ is the diffusion coefficient; $\mu(t, \beta)$ is a nonlinear function of time $t$, which describes the nonlinear characteristics of engine performance degradation (if $\mu(t, \beta) = \mu$, Eq. (1) becomes the conventional linear model); and $B(t)$ is standard Brownian motion, which describes the random fluctuations. Without loss of generality, we suppose $X(t_0) = 0$ with $t_0 = 0$. To reflect the nonlinearity of the model, this paper sets $\mu(t, \beta) = \beta t^{\beta - 1}$, so that the drift term becomes $\alpha t^{\beta}$. Because the engine is affected by internal and external factors, its degradation process is highly random, so this paper treats the drift coefficient $\alpha$ as a random parameter that follows the normal distribution $\alpha \sim N(\mu_a, \sigma_a^2)$. The nonlinear Wiener process has been widely used in performance degradation modeling of complex systems such as aeroengines, bearings and batteries [14,15].
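As an illustration of the model above, the following minimal sketch simulates one degradation path on a regular time grid, assuming $X(0) = 0$ and $\mu(t, \beta) = \beta t^{\beta-1}$. The function name `simulate_path` and its arguments are illustrative and do not come from the paper.

```python
import math
import random


def simulate_path(alpha, sigma_b, beta, n_steps, dt=1.0, rng=None):
    """Simulate X(t) = alpha * t**beta + sigma_b * B(t) with X(0) = 0.

    Returns the time grid and the simulated degradation values.
    """
    if rng is None:
        rng = random.Random()
    times, path = [0.0], [0.0]
    x = 0.0
    for j in range(1, n_steps + 1):
        t_prev, t = (j - 1) * dt, j * dt
        # Drift over the step: alpha times the integral of mu(tau, beta).
        drift = alpha * (t ** beta - t_prev ** beta)
        # Brownian increment over the step has variance dt.
        noise = sigma_b * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        x += drift + noise
        times.append(t)
        path.append(x)
    return times, path
```

To reflect unit-to-unit variability, `alpha` can be drawn per engine from $N(\mu_a, \sigma_a^2)$, for example with `rng.gauss(mu_a, sigma_a)`, before calling `simulate_path`.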

RUL prediction method of single degradation signal
According to the concept of first hitting time (FHT), the RUL of the equipment is the time required for the degradation state to go from $X(t_k)$ to first reaching the failure threshold $F$, that is, $L_k = \inf\{l_k : X(t_k + l_k) \ge F \mid X(t_k) < F\}$. $F$ can be obtained from Eq. (2) [16],
where $L_k$ is the RUL of the system at the current time $t_k$ and $x_{i,N_i}$ is the performance degradation of the $i$-th engine at its failure time. According to [17], if the unknown parameters are fixed, the PDF of the RUL can be formulated as in Eq. (3), where $x_k$ is the performance degradation of the engine at time $t_k$. The engine RUL is then given by Eq. (4), obtained by taking the expectation of Eq. (3).
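The first-hitting-time definition can also be approximated numerically. The sketch below is a Monte Carlo approximation, not the closed-form PDF of Eq. (3): it draws the drift from $N(\mu_a, \sigma_a^2)$, steps the process forward from the current state, and averages the first crossing times. All function names and default values are illustrative assumptions. Because the path is only checked at grid points, the estimate is slightly biased toward later hitting times.

```python
import math
import random


def simulate_rul(x_k, t_k, alpha, sigma_b, beta, threshold,
                 dt=1.0, max_steps=10_000, rng=None):
    """Return the first l > 0 (a multiple of dt) with X(t_k + l) >= threshold.

    Returns None if the threshold is not reached within max_steps.
    """
    if rng is None:
        rng = random.Random()
    x, t = x_k, t_k
    for step in range(1, max_steps + 1):
        t_next = t + dt
        x += alpha * (t_next ** beta - t ** beta)
        x += sigma_b * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        t = t_next
        if x >= threshold:
            return step * dt
    return None


def expected_rul(x_k, t_k, mu_a, sigma_a, sigma_b, beta, threshold,
                 n_sims=2000, seed=0):
    """Average the simulated first hitting times over random drifts."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n_sims):
        alpha = rng.gauss(mu_a, sigma_a)  # unit-specific random drift
        rul = simulate_rul(x_k, t_k, alpha, sigma_b, beta, threshold, rng=rng)
        if rul is not None:
            hits.append(rul)
    return sum(hits) / len(hits) if hits else None
```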

Offline parameter estimation
The offline parameters are estimated from historical data on engine performance degradation. Assume that there are $N$ engines and that the degradation data of the $n$-th engine, $x_{n,1}, x_{n,2}, \ldots, x_{n,m_n}$, are obtained at times $t_{n,1}, t_{n,2}, \ldots, t_{n,m_n}$, respectively. The degradation path of the $n$-th engine at the $j$-th time point $t_{n,j}$ is, from Eq. (1), given by
$$x_n(t_{n,j}) = \alpha_n t_{n,j}^{\beta} + \sigma_b B(t_{n,j}) \tag{5}$$
Let $X_n = (x_n(t_{n,1}), x_n(t_{n,2}), \ldots, x_n(t_{n,m_n}))^{\mathrm T}$, let $X = \{X_1, X_2, \ldots, X_N\}$ represent the collection of performance degradation data for all engines, and let $T_n = (T_{n,1}, T_{n,2}, \ldots, T_{n,m_n})^{\mathrm T}$ with $T_{n,j} = t_{n,j}^{\beta}$. According to Eq. (5) and the independent increment property of $B(t)$, $X_n$ follows a multivariate normal distribution whose mean and covariance are
$$E(X_n) = \mu_a T_n \tag{6}$$
$$\Sigma_n = \sigma_b^2 Q_n + \sigma_a^2 T_n T_n^{\mathrm T}, \qquad Q_n = \big[\min(t_{n,i}, t_{n,j})\big]_{1 \le i,j \le m_n} \tag{7}$$
Since the devices are independent, the log-likelihood function of the parameter $\theta = (\mu_a, \sigma_a^2, \sigma_b^2, \beta)^{\mathrm T}$ given $X$ is depicted in Eqs. (8)-(10) [18]. Eq. (8) is complicated, and it is difficult to obtain an analytical solution. Therefore, assuming that $\sigma_a^2$, $\sigma_b^2$ and $\beta$ are known, the first derivative of Eq. (8) with respect to $\mu_a$ is set to 0, which gives the maximum likelihood estimate of $\mu_a$ in Eq. (11). The profile likelihood function of $\sigma_a^2$, $\sigma_b^2$ and $\beta$, with $\mu_a$ replaced by its maximum likelihood estimate, is then depicted in Eq. (12) [18]. The maximum likelihood estimates of $\sigma_a^2$, $\sigma_b^2$ and $\beta$ are obtained with the fminsearch function of MATLAB and substituted into Eq. (11) to obtain the maximum likelihood estimate of $\mu_a$.
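The profile-likelihood step can be sketched as follows. The sketch does not reproduce the paper's Eqs. (8)-(12). It uses the equivalent likelihood of the increments, whose covariance is a diagonal matrix plus a rank-one term, so the Sherman-Morrison identity gives the inverse and determinant in closed form. The names `unit_stats` and `profile_log_likelihood` are illustrative, and $X(0) = 0$ at $t = 0$ is assumed for every engine.

```python
import math


def unit_stats(times, values, sigma_b2, beta):
    """Sufficient statistics of one engine, computed from its increments."""
    s_tt = s_tx = s_xx = log_det_d = 0.0
    t_prev, x_prev = 0.0, 0.0  # assumes X(0) = 0 at t = 0
    for t, x in zip(times, values):
        var = sigma_b2 * (t - t_prev)          # Brownian variance of the step
        d_big_t = t ** beta - t_prev ** beta   # increment of t**beta
        dx = x - x_prev
        s_tt += d_big_t ** 2 / var
        s_tx += d_big_t * dx / var
        s_xx += dx ** 2 / var
        log_det_d += math.log(var)
        t_prev, x_prev = t, x
    return len(times), s_tt, s_tx, s_xx, log_det_d


def profile_log_likelihood(units, sigma_a2, sigma_b2, beta):
    """Return (mu_a_hat, log-likelihood at mu_a_hat) for fixed other parameters.

    units is a list of (times, values) pairs, one pair per engine.
    """
    stats = [unit_stats(t, x, sigma_b2, beta) for t, x in units]
    # Closed-form maximiser of the Gaussian log-likelihood in mu_a.
    num = sum(s_tx / (1 + sigma_a2 * s_tt) for _, s_tt, s_tx, _, _ in stats)
    den = sum(s_tt / (1 + sigma_a2 * s_tt) for _, s_tt, _, _, _ in stats)
    mu_a = num / den
    loglik = 0.0
    for m, s_tt, s_tx, s_xx, log_det_d in stats:
        k = 1 + sigma_a2 * s_tt
        quad = (s_xx - sigma_a2 * s_tx ** 2 / k
                - 2 * mu_a * s_tx / k + mu_a ** 2 * s_tt / k)
        loglik += -0.5 * (m * math.log(2 * math.pi)
                          + log_det_d + math.log(k) + quad)
    return mu_a, loglik
```

A derivative-free optimizer (the role played by fminsearch in the paper) would then maximize the returned log-likelihood over $(\sigma_a^2, \sigma_b^2, \beta)$; that outer loop is omitted here.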

Online parameter estimation
In the online phase, assume that when the engine runs to time $t_k$, the observed data are $X_{1:k} = (x_1, x_2, \ldots, x_k)$. After the offline parameters are obtained, real-time data for the working engine are collected through the sensors, and the posterior parameters of the model, that is, the online parameters, can be obtained using Bayes' rule.
According to Bayes' rule, when the offline parameters $\theta = (\mu_a, \sigma_a^2, \sigma_b^2)$ and the real-time data $X_{1:k}$ are known, the posterior estimate of the parameters is given in Eq. (13) [19].
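The online update can be illustrated with the standard conjugate-normal result. The sketch assumes that only the drift $\alpha$ is updated, with $\sigma_b^2$ and $\beta$ held at their offline values, and that the increments are conditionally independent given $\alpha$. It is not a transcription of the paper's Eq. (13), and the function name is illustrative.

```python
def update_drift_posterior(times, values, mu_a0, sigma_a2_0, sigma_b2, beta):
    """Posterior mean and variance of the drift alpha given observations.

    Prior: alpha ~ N(mu_a0, sigma_a2_0). Each increment satisfies
    dx_j | alpha ~ N(alpha * dT_j, sigma_b2 * dt_j), with dT_j the
    increment of t**beta. Assumes X(0) = 0 at t = 0.
    """
    precision = 1.0 / sigma_a2_0
    weighted = mu_a0 / sigma_a2_0
    t_prev, x_prev = 0.0, 0.0
    for t, x in zip(times, values):
        var = sigma_b2 * (t - t_prev)
        d_big_t = t ** beta - t_prev ** beta
        precision += d_big_t ** 2 / var
        weighted += d_big_t * (x - x_prev) / var
        t_prev, x_prev = t, x
    sigma_a2_k = 1.0 / precision
    return weighted * sigma_a2_k, sigma_a2_k
```

With no observations the function returns the offline prior unchanged. As more data arrive, the posterior variance shrinks and the mean moves toward the drift implied by the observed path.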

Construction of correlation between multi-sensor degradation signals
If the correlation between signals is known, traditional methods can describe it directly with a correlation matrix. However, in practice, especially in the engine operating environment, the correlation between multi-sensor degradation signals is often unknown. In this case, traditional methods are difficult to apply, and Copula theory can solve this problem. Various Copula functions, such as the Gaussian, Clayton, Gumbel and Frank Copulas, are each suitable for describing sequences with a specific tail distribution [20]. The R-Vine Copula can use different Copula functions to describe the relationship between different signals, which suits situations where there are multi-sensor degradation signals and the correlation between signals is unknown. Therefore, the R-Vine Copula is used to describe the relationship between degradation signals in this paper.

Copula theory
In 1959, Sklar introduced Copula theory to construct multivariate distributions. Copula theory decomposes a multivariate distribution into one Copula function and multiple marginal distributions. The properties of the variables are described by their marginal distributions, and the correlation between variables is determined by the Copula function [21].
Let $X = (X_1, X_2, \ldots, X_d)$ be a $d$-dimensional random vector with joint distribution function $F$, and let $F_1, F_2, \ldots, F_d$ be the marginal distribution functions of the random variables $X_1, X_2, \ldots, X_d$, respectively. Then there is a Copula function $C$ that satisfies
$$F(x_1, x_2, \ldots, x_d) = C\big(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)\big) \tag{14}$$
The joint probability density function corresponding to Eq. (14) is
$$f(x_1, x_2, \ldots, x_d) = c\big(F_1(x_1), F_2(x_2), \ldots, F_d(x_d)\big) \prod_{i=1}^{d} f_i(x_i) \tag{15}$$
where $c$ represents the density function of the Copula and $f_i(x_i)$ is the marginal density function of the variable $X_i$.

Correlation between multi-sensor signals in R-Vine copula model
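For $n$ variables $X_1, X_2, \ldots, X_n$, the R-Vine Copula structure consists of $n - 1$ trees $T = (T_1, T_2, \ldots, T_{n-1})$, where $N_i$ and $E_i$ represent the vertex set and edge set of the $i$-th tree, respectively. An example of the R-Vine tree structure with 5 variables is shown in Fig. 1. With edge sets $E_1, \ldots, E_{n-1}$, the density function of these $n$ variables is as follows [12,23]:
$$f(x_1, \ldots, x_n) = \prod_{k=1}^{n} f_k(x_k) \prod_{i=1}^{n-1} \prod_{e \in E_i} c_{j(e),k(e) \mid D(e)}\big(F(x_{j(e)} \mid x_{D(e)}),\, F(x_{k(e)} \mid x_{D(e)})\big) \tag{16}$$
where each edge $e$ joins the conditioned variables $j(e)$ and $k(e)$ given the conditioning set $D(e)$. The conditional distribution functions are obtained from
$$F(x \mid u) = \frac{\partial C_{x,u_a \mid u_{-a}}\big(F(x \mid u_{-a}),\, F(u_a \mid u_{-a}) \mid \theta_{x,u_a \mid u_{-a}}\big)}{\partial F(u_a \mid u_{-a})} \tag{17}$$
where $u_a$ represents a component of the $n$-dimensional random vector $u$; $u_{-a}$ represents the $(n-1)$-dimensional vector after removing $u_a$ from $u$; and $\theta_{x,u_a \mid u_{-a}}$ is the parameter of the Copula function $C_{x,u_a \mid u_{-a}}$. This paper uses the VineCopula and copula packages in the R language to build the R-Vine Copula model.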
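To make the pair-copula decomposition concrete, the sketch below evaluates the copula density of a three-variable vine with the tree structure 1-2-3. Tree 1 holds the pairs (1,2) and (2,3), and tree 2 holds the pair (1,3) conditioned on 2. Using Gaussian pair copulas throughout is an assumption made for brevity; the paper selects the pair families with R packages, and this Python code is only an illustration of the structure.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def gaussian_pair_density(u, v, rho):
    """Density c(u, v) of a bivariate Gaussian copula with correlation rho."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    r2 = rho * rho
    expo = -(r2 * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - r2))
    return math.exp(expo) / math.sqrt(1 - r2)


def gaussian_h(u, v, rho):
    """Conditional distribution h(u | v) = P(U <= u | V = v)."""
    x, y = _STD.inv_cdf(u), _STD.inv_cdf(v)
    return _STD.cdf((x - rho * y) / math.sqrt(1 - rho * rho))


def vine3_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Copula density of a 3-variable vine with trees {12, 23} and {13|2}."""
    # Tree 1: unconditional pairs on the uniform margins.
    tree1 = (gaussian_pair_density(u1, u2, rho12)
             * gaussian_pair_density(u2, u3, rho23))
    # Tree 2: pair copula on the conditional distributions given variable 2.
    tree2 = gaussian_pair_density(gaussian_h(u1, u2, rho12),
                                  gaussian_h(u3, u2, rho23), rho13_2)
    return tree1 * tree2
```

Multiplying the result by the marginal densities $f_i(x_i)$, with $u_i = F_i(x_i)$, gives the joint density. With all correlations set to zero the copula density reduces to 1, which corresponds to independence.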

RUL prediction with multi-sensor signals
The RUL of a single degradation signal with degradation process $X(t)$ and threshold $F$ is $L = \inf\{l : X(t + l) \ge F \mid X(t) < F\}$, defined in the sense of the first hitting time. Because the engine requires high reliability and safety, the entire system is considered to have reached the threshold as soon as the degradation amount of any one sensor signal reaches its threshold. In the online phase, assuming that when the system runs to time $t_k$ the observed data of the signals at the current time are $X_{1:d,k}$, the predicted RUL with multi-sensor degradation signals is given in Eq. (18) [24]. However, it is difficult to obtain an analytical solution of Eq. (18). In this paper, a simulation-based method is used to achieve RUL prediction. In the simulation method, the degradation data $X_{1:d,k} = (x_{1k}, x_{2k}, \ldots, x_{dk})$ observed at time $t_k$ serve as the starting point, and each simulation step updates the degradation amount of the corresponding degradation signal. As long as no degradation signal has reached its threshold, the cycle continues. To ensure the reliability and safety of the engine, as soon as one degradation signal reaches its threshold, the cycle exits and returns the result. The simulation-based algorithm follows [24,25].
In the algorithm, $\Delta t$ is the increment of observation time, that is, $\Delta t = t_j - t_{j-1}$. For the C-MAPSS dataset used in the experiment, all time intervals are equal, that is, $\Delta t = 1$.
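The simulation loop described above can be sketched as follows. The sketch rests on several assumptions: only two signals are simulated, their standardized increments are coupled by a bivariate Clayton copula as a stand-in for the fitted R-Vine Copula, and each signal follows the nonlinear Wiener model with a fixed drift. All names are illustrative, and this is not a transcription of the paper's algorithm.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def clayton_pair(theta, rng):
    """Draw (u1, u2) from a Clayton copula (theta > 0) by conditional inversion."""
    u1 = 1.0 - rng.random()  # in (0, 1]
    w = 1.0 - rng.random()
    base = u1 ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0
    return u1, base ** (-1.0 / theta)


def simulate_system_rul(x_k, t_k, alphas, sigma_bs, betas, thresholds, theta,
                        dt=1.0, max_steps=10_000, rng=None):
    """Step both signals forward until either one reaches its threshold.

    Returns the elapsed time, or None if no signal crosses within max_steps.
    """
    if rng is None:
        rng = random.Random()
    x, t = list(x_k), t_k
    for step in range(1, max_steps + 1):
        u = clayton_pair(theta, rng)
        t_next = t + dt
        for i in range(2):
            # Map the dependent uniforms to standard normal shocks.
            z = _STD.inv_cdf(min(max(u[i], 1e-12), 1.0 - 1e-12))
            x[i] += alphas[i] * (t_next ** betas[i] - t ** betas[i])
            x[i] += sigma_bs[i] * math.sqrt(dt) * z
        t = t_next
        # The system fails as soon as any one signal reaches its threshold.
        if any(xi >= f for xi, f in zip(x, thresholds)):
            return step * dt
    return None
```

Repeating the call many times, with the drifts drawn from their online posterior distributions, and averaging the returned times gives a Monte Carlo estimate of the system RUL.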

Experiment
In this section, an aeroengine degradation dataset is employed to demonstrate the effectiveness of the proposed method and to compare its prediction performance with some existing works.
Experiments use the C-MAPSS dataset provided by NASA [26]. The C-MAPSS dataset is widely used to study the engine degradation trend mechanism. The structural diagram of the turbofan engine is illustrated in Fig. 2.
The training set contains the full life data of multiple engines from normal operation to failure, and the test set contains incomplete data that end before failure, which are used for RUL estimation. The data for each engine in the training set or test set form a multivariate time series with 26 variables: the first variable is the serial number of the engine, the second is the working time in cycles (t), the 3rd to 5th are 3 operating parameters (altitude, Mach number, and sea-level temperature) that have a significant impact on engine performance, and the remaining 21 are monitoring data from 21 different sensors. The dataset description is shown in Table 1.

Data preprocessing
The prognostic procedure consists of data preprocessing, training, and testing. Some sensor signals have constant values throughout the operation of the engine. They are not related to the engine degradation process, so they are removed. For the degradation data, the order of magnitude is reduced without changing the characteristics of the data, and the mean value of the first ten cycles is subtracted to facilitate model calculation.
Considering the high similarity of the data collected by most sensors [27], and in order to reduce the computational complexity of the prediction model, 6 representative sensors were selected for the experiments. Since the FD001 and FD003 datasets are collected under the same working conditions, the data from sensor 10 are unnecessary for them. The selected sensors are shown in Table 2.
Because of the noise and large random fluctuations, the data are first filtered and smoothed. Taking the No.1 sensor data of the first engine as an example, the processed FD001, FD002, FD003 and FD004 datasets are shown in Fig. 3 a-d.
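The preprocessing steps above can be sketched as follows. The paper does not state which filter is used, so the trailing moving average here is an assumption, as are the function names and the window length.

```python
def drop_constant_columns(rows):
    """Remove columns whose value never changes across all rows.

    rows is a list of equal-length lists (one row per cycle).
    Returns the indices of the kept columns and the filtered rows.
    """
    n_cols = len(rows[0])
    kept = [c for c in range(n_cols) if len({row[c] for row in rows}) > 1]
    return kept, [[row[c] for c in kept] for row in rows]


def subtract_baseline(series, n_baseline=10):
    """Subtract the mean of the first n_baseline cycles from a sensor series."""
    head = series[:n_baseline]
    base = sum(head) / len(head)
    return [v - base for v in series]


def moving_average(series, window=5):
    """Trailing moving average; early points use the samples seen so far."""
    out, running = [], 0.0
    for i, v in enumerate(series):
        running += v
        if i >= window:
            running -= series[i - window]
        out.append(running / min(i + 1, window))
    return out
```

Applied in sequence to each retained sensor, these steps give a degradation signal that starts near zero, which matches the assumption $X(t_0) = 0$ used in the degradation model.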

RUL prediction of single degradation signal
This article uses a nonlinear Wiener process to model the engine degradation process, with the nonlinear function $\mu(t, \beta) = \beta t^{\beta - 1}$ taking a power-function form. Based on Eq. (12), the optimal solution is obtained with the fminsearch function of MATLAB, which gives the prior values of $\sigma_{a,0}^2$, $\sigma_{b,0}^2$ and $\beta$. The prior value of $\mu_{a,0}$ is then obtained from Eq. (11). The prior parameters of subsets FD001, FD002, FD003 and FD004 are shown in Table 3. The last five time points of the test set (with a time interval of 1) are selected for RUL prediction. Taking the No.1 engine as an example again, the PDF and RUL at the last five time points for all sensors are shown in Fig. 4 a-e.

RUL prediction of multi-sensor signals based on R-Vine copula
The RUL prediction based on the R-Vine Copula is calculated according to Algorithm 1 in Section 4. Again taking FD001 as an example, the pair Copula function types and parameters of the R-Vine Copula model are listed in Table 4, and the R-Vine tree structure is shown in Fig. 5. The numbers 1, 2, 3, 4, and 5 represent Sensors 3, 4, 7, 11, and 12, respectively. The RUL at the last moment of every engine in the test set is calculated and compared with the real RUL. Taking FD001 and FD003 as examples, the predicted and actual RUL of 100 engines are compared in Fig. 6 a, b.
Fig. 4. PDF and RUL, (a) PDF and RUL of sensor 3; (b) PDF and RUL of sensor 4; (c) PDF and RUL of sensor 7; (d) PDF and RUL of sensor 11; (e) PDF and RUL of sensor 12.

Comparisons with other methods
In this section, the method proposed in this paper is compared with other methods. The error between the predicted RUL and the actual RUL characterizes the prediction performance of the model. The root mean square error (RMSE) and the score function are adopted as statistical indices to assess the performance of the proposed model, as depicted in Eqs. (19)-(21), where $n$ is the number of testing engine units and $d_i$ is the predicted RUL $\hat{y}_i$ minus the real RUL $y_i$. The score function is asymmetric: it favors early predictions and penalizes late predictions more heavily. Therefore, the lower the score, the more reliable the model. Table 5 compares the method proposed in this paper with other methods [28-32]. For datasets FD001 and FD003, the proposed method performs better on the score indicator.
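The two indices can be computed as in the sketch below. It assumes the scoring function commonly used with the C-MAPSS data, with the constants 13 for early and 10 for late predictions. The paper's own Eqs. (19)-(21) are not reproduced in this text, so the constants should be checked against them.

```python
import math


def rmse(predicted, actual):
    """Root mean square error of the RUL predictions."""
    errors = [p - a for p, a in zip(predicted, actual)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def rul_score(predicted, actual):
    """Asymmetric score: late predictions (d > 0) are penalised more heavily."""
    total = 0.0
    for p, a in zip(predicted, actual):
        d = p - a  # predicted RUL minus real RUL
        if d < 0:
            total += math.exp(-d / 13.0) - 1.0  # early prediction
        else:
            total += math.exp(d / 10.0) - 1.0   # late prediction
    return total
```

For an error of the same magnitude, a late prediction contributes more to the score than an early one, which is why a lower score indicates a safer model.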

Conclusion
RUL prediction with multi-sensor degradation signals is a more realistic and challenging issue than the cases of a single degradation signal. This work investigated a novel aeroengine RUL prediction method with multi-sensor data.
1) The correlation between each degradation signal and the engine degradation process is different, so using multi-sensor degradation signals to predict engine RUL is appropriate. The RUL prediction model proposed in this paper can make full use of the correlation between different degradation signals.
2) The degradation process of the engine is nonlinear, and the conventional linear Wiener process cannot accurately describe a degradation process with nonlinear characteristics, so this paper uses the nonlinear Wiener process to model the degradation process of the engine.
3) The experimental results verify the effectiveness of the proposed method in comparison with other methods. For engines with different degradation processes, the nonlinear Wiener process describes well those with an obvious degradation trend, such as the FD001 and FD003 subsets. The proposed method achieves better results on the score index, and its tendency toward early prediction can make the aircraft safer.
Although the proposed approach can effectively describe the degradation process with multi-sensor data, some problems still need to be investigated in the future. Because of the different operating conditions and fault modes, the proposed method does not perform very well on the subsets FD002 and FD004. In such cases, deep learning models are often a better choice.
In the future, we would like to explore further novel methods to tackle other systems' remaining useful life prediction problems with multi-sensor data.